::::::::::::::::::::::::::::::::::::::::::::::::::::
CONFUCIUS SAY: A wise man changes his mind often; a fool, never! With a little help from Janet, I'm becoming a pretty smart cookie! Move over Einstein!
Previously we discussed simple methods to store data, input it to the program, process the data in some way, and display it on the screen (or output to a file), without going into the special techniques which make up dBase programming. We also learned READ/RESTORE statements do not provide flexible data files, and while data strings are simple to use and maintain, they waste significant amounts of memory. A more convenient way to organize data is through the use of delimited text files.
The most commonly used method, though certainly not the only one, is a text file with data fields separated by commas, called comma separated values and saved as a CSV file. CSV files are still text files, but the data fields within each record are separated by commas, and an empty field still gets its comma.
When creating a data base, the programmer can choose how to delimit the data file. Some programmers prefer the horizontal tab to create a tab delimited file. Others may prefer a \ (back-slash) or semi-colon. A comma may not be the best choice if you are constructing an address file, because many addresses include an apartment or building number (or both), usually separated with a comma. This limitation can be avoided if you provide additional data fields or use a different character to mark the limits of the data field. Any character not used in your data can serve as the delimiter. You name the delimiter each time you read a field with INPUTTO$(#file, "¿"), where #file is the file handle and ¿ (ALT+0191) is the character used as the delimiter. (Works pretty well, except in Latin countries.) [Editor's note: ¿ (ALT+0191) can be printed within your program as Chr$(191)].
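As a rough sketch of how INPUTTO$ might be used with a delimiter other than a comma, the snippet below writes a tiny semi-colon delimited file and reads it back one field at a time. The file name demo.txt, the handle #demo, the variable item$, and the two sample records are made up for illustration. Substitute your own delimiter for the ";" shown here.

```basic
'Sketch only: demo.txt and its contents are hypothetical.
OPEN "demo.txt" FOR OUTPUT AS #demo
PRINT #demo, ".50;1863;New Orleans, LA"
PRINT #demo, "1.00;1864;Charlotte, NC"
CLOSE #demo

OPEN "demo.txt" FOR INPUT AS #demo
WHILE EOF(#demo) = 0
    'Read up to the next semi-colon or the end of the line.
    item$ = INPUTTO$(#demo, ";")
    PRINT item$
WEND
CLOSE #demo
END
```

Because the semi-colon marks the fields, the comma inside each city field is treated as ordinary data.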
Because there are no iron-clad industry standards, the choice is yours. Comma delimited files are used widely across the Internet, and may be opened in nearly all spread-sheet applications, though a little fine tuning may be needed. CSV files can also be opened for editing with any text processor, WordPad and Notepad being just two.
Previously we used a text file with data records consisting of a single string, containing 6 data fields, to store each of 20 records (plus an index string), for the Fancy Coin Collector's Shop. The file size of coins.xyz was 1,003 bytes. In this example, I used the same data, but saved as coins.csv, which required only 700 bytes. I created the data for coins.csv using Word Pad and entering the same information we used for a Six-Pak of Data, but it was not necessary to use redundant spaces to align the data fields in a delimited file.
As a courtesy, many programmers will include an index string as the first line of the CSV or other delimited file. When read by the program, this usually becomes myArray(0), or myArray$(0), and may or may not be used while processing. Here is the index string and twenty records used in this example. You should copy and paste this data as coins.csv, and save it to the folder where you will save the program we work with this time. If you compare this file to the coins.xyz file, you'll find only three differences: the unneeded spaces were removed, commas are used to separate all data fields, and a decimal was included with the first data field, allowing LB to recognize it as a numerical value. Without the decimal point, the native SORT command will order the data according to ASCII values and you'll find 25¢ listed after $1.00, which is not our intention.
val,date,mint$,qual$,loc$,note$
.50,1863,New Orleans,FINE,A05,Civil War
1.00,1864,Charlotte,FAIR,A02,Civil War
.25,2002,West Point,UN,B03,CT
.25,2002,West Point,UN,B05,DE
.05,1902,West Point,VFINE,B08,Indian
.50,1965,West Point,UN,A07,JFK
1.00,1921,San Francisco,UN,B05,Liberty
.25,2003,Philadelphia,UN,B03,MA
.25,2003,Philadelphia,UN,B04,VT
.10,1947,Denver,EXFINE,C10,
1.00,1895,Carson City,VFINE,A03,
.05,1898,Philadelphia,FINE,G08,
1.00,1898,San Francisco,VFINE,G02,
.01,1906,Philadelphia,UN,E09,
.01,1912,West Point,EXFINE,E08,
.25,1917,Denver,VFINE,B06,
1.00,1918,Carson City,EXFINE,B05,
.10,1944,Philadelphia,VFINE,G02,
.05,1950,Denver,VFINE,C06,
.10,1956,Philadelphia,UN,F04,
Why do some lines end with a comma while others do not? Because the limits of every data field in this file are marked with a comma. Some coins have a remark following the loc$ (the row and shelf within our shop), and other coins do not. Even if we don't have any information to put into the note$ field, the field must be included to keep the data properly organized for input and processing. (A missing data field, or an extra one, will surely generate an input error at runtime!)
To work with a data file, we need an application program, and one or more data files for that program. Word Pad is an application to work with text files. The program we are working with now will be CoinShop.bas, and this is my code:
'The Fancy Coin Collector's Shop Inventory
'Sort coins.csv using an array with two dimensions
'Written by Welo
OPEN "coins.csv" for INPUT as #1
LINE INPUT #1, thisLine$
nFields=1 'Initialize the field counter at 1.
'How many data fields are in the line?
WHILE WORD$(thisLine$, nFields, ",") <> ""
nFields=nFields+1 'Increment nFields by 1.
WEND
CLOSE #1
'How many records are in the data file?
OPEN "coins.csv" for INPUT as #1
WHILE EOF(#1)=0
LINE INPUT #1, thisLine$
nRecords=nRecords+1
WEND
CLOSE #1
'Dimension the array for nRecords, nFields
DIM myArray$(nRecords-1, nFields-1) 'Index string will be element zero.
OPEN "coins.csv" for INPUT as #1
FOR i=0 to nRecords-1 'Get the records including the index.
FOR j=0 to 5 'Get the six individual data fields.
INPUT #1, myArray$(i,j)
NEXT j
NEXT i
CLOSE #1
[makeChoice]
CLS
PRINT "There are ";nRecords-1;" records in the file."
PRINT "There are ";nFields-1;" data fields in each record."
PRINT
PRINT "How would you like the data sorted?"
PRINT TAB(5); "1 = Sort by value of coin."
PRINT TAB(5); "2 = Sort by date on coin."
PRINT TAB(5); "3 = Sort by location of US Mint."
PRINT TAB(5); "4 = Sort by condition of coin."
PRINT TAB(5); "5 = Sort by location within shop."
PRINT TAB(5); "6 = Sort by coin remarks."
PRINT
INPUT "Please make your selection... "; UR$ 'Get user request.
UR=INT(VAL(UR$)) 'Check for invalid entries.
IF UR > 6 OR UR < 1 THEN NOTICE " Knucklehead Response." + CHR$(13) + _
"Please try again." : GOTO [makeChoice] 'Give the user another try.
CLS
begin=TIME$("ms") 'Start time for sort
SORT myArray$(), 1, nRecords-1, UR-1 'Sort by user request
finish=TIME$("ms") 'End time for sort
PRINT "VALUE"; TAB(8);"DATE";TAB(15);"MINT";TAB(30);"COND.";TAB(40);"LOC.";TAB(46);"NOTE"
FOR i= 1 to nRecords-1 'Don't include the index, myArray$(0,0).
PRINT myArray$(i,0);
PRINT TAB(8); myArray$(i,1);
PRINT TAB(15); myArray$(i,2);
PRINT TAB(30); myArray$(i,3);
PRINT TAB(40); myArray$(i,4);
PRINT TAB(46); myArray$(i,5)
NEXT i
PRINT
PRINT "The data collection was sorted in "; finish-begin; " milliseconds."
INPUT "Do another sort? (Y/N) "; UR$ 'Get another response from user.
IF LEFT$(UR$, 1)="Y" or LEFT$(UR$, 1)="y" _
THEN [makeChoice]
CLS 'Executes only if not performing another operation.
END
After copying and saving your CoinShop.bas file to the LB files, if you receive a compile error the first time you attempt to run it, check the right/left margins of each line. When pasting a BAS file from a web page, a long line may be wrapped in the middle of a quoted string or comment, or the logical line continuation (_) may land in the wrong position, so LB does not interpret it as one continuous line.
When working with a CSV or other delimited file, open it with any spread sheet application to see if the author has included an index to tell us what the information in the data fields represents, and how many data fields are used for each record. (Remember, a record contains all the variables related to an item.) You can also open the data file with a text editor and count the data fields. Because this file is labeled coins.csv, I can be reasonably certain the author (me!) used commas to mark the data fields.
The first thing I've done with my program is open the CSV file, input a single line of data, then use WORD$ to count the comma-separated items in that line. This determines both how many data fields are in each record and the value to be used for the second dimension of my data array.
Wait a minute! There are six data fields but only five commas? How does this work? The commas sit between the fields, so six fields need only five of them. The program inputs the first value as myArray$(0,0), the next value will be myArray$(0,1), and so on, until we have 6 data fields, because 0 to 5 = 6 valid elements.
Next we count the lines in the CSV file, easily done because every line of a CSV or text file ends with a carriage return and line feed, CHR$(13) + CHR$(10), and LINE INPUT reads one complete line each time it is used. With a WHILE/WEND loop and a counter to increment nRecords each time through the loop until the EOF marker is reached, we learn how many records are in the file.
But my counter says there are 21 records! When I DIMension my array as myArray$(21,6) and try to load records 0 to 21, the program bombs with an "Input past end of file" error.
Of course it does! You forgot that array elements are numbered from zero, and that myArray$(0,0) begins the index string. The counter found 21 lines, so they occupy rows 0 through 20, and asking for row 21 sends INPUT looking for a record that does not exist. Your DIM statement should be DIM myArray$(nRecords-1,nFields-1). The field counter behaves in a similar way: its WHILE loop stops only when it reaches the first empty word, so nFields ends up one greater than the number of data fields. Now myArray$ neatly holds all your data without those annoying runtime errors. I used DIM myArray$(nRecords-1,nFields-1) to declare a two-dimensional array. This is not a six-dimensional array! It has one dimension of rows and a second dimension of columns, six of which hold our data fields. You can't expect the program to output lemonade if you input oranges!
Now that we know the proper dimensions of the array, we can use two nested loops to load the data into all elements of myArray$(i,j) and the program will not crash.
Oops! You're loaded and you crashed? (Better lay off the Bud!) You received an "INPUT past end," error? Is it possible, when you copied and pasted the data to a text file, a single trailing comma was not copied, or a line was broken when auto-wrapped? This will definitely interfere with the sequence of your loaded data.
Another common error, which happens all too frequently, is the invisible line feed. Experienced typists may unconsciously press ENTER one or more times at the end of a text file when editing. Your program has counted those line feeds, assumed them to represent the existence of another data record and is searching for non-existent data for INPUT. When viewing your delimited file in a text editor, the blinking cursor should always end on the line below the last line of data, and immediately under the first character of that line. If not… you are suffering from invisible data errors which will bomb your program every time. Fortunately, invisible file errors are easier to detect than a flu virus and can be cured without a visit to the doctor's office.
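One possible guard against stray ENTER presses at the end of the file is to count only the lines that actually contain text. The sketch below assumes coins.csv sits in the same folder as the program and reuses the nRecords and thisLine$ names from CoinShop.bas. It is not part of the original listing.

```basic
'Sketch: count only the lines that contain text.
'Assumes coins.csv is in the program's folder.
nRecords = 0
OPEN "coins.csv" FOR INPUT AS #1
WHILE EOF(#1) = 0
    LINE INPUT #1, thisLine$
    'TRIM$ removes spaces, so a line of blanks is skipped too.
    IF TRIM$(thisLine$) <> "" THEN nRecords = nRecords + 1
WEND
CLOSE #1
PRINT "Lines containing data: "; nRecords
END
```

This only corrects the count. A blank line in the middle of the file would still upset the loading loop and should be removed with a text editor.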
After the data has been successfully entered into myArray$, the user is presented with an options screen to select which value will be used to order the data. Some versions of BASIC do not include a native SORT command and you will have to write your own code. Liberty BASIC has a nifty little SORT command which we can use for any type of data. If you are considering writing your own algorithm and want to see an interesting comparison of sort methods, download and inspect [sort_willie_lee.zip], Bill Beasley's open source code of eight sorting algorithms.
Sorting with the native SORT command is remarkably fast. I ran 15 sorts of 1,000 data records and the average sort was completed in 15.5 milliseconds, while none took longer than 32 milliseconds.
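If you would like to run a timing test of your own, something along these lines should do. This is only a sketch, not the test described above: the array test$ and its random five-digit strings are stand-ins for real records, and your timings will depend on your machine.

```basic
'Sketch: time the native SORT on 1,000 made-up records.
DIM test$(1000)
FOR i = 1 TO 1000
    'Random five-digit strings stand in for real data.
    test$(i) = STR$(INT(RND(1) * 90000) + 10000)
NEXT i
begin = TIME$("ms") 'Milliseconds since midnight
SORT test$(), 1, 1000 'Sort elements 1 through 1000
finish = TIME$("ms")
PRINT "Sorted 1000 records in "; finish - begin; " milliseconds."
END
```

Because TIME$("ms") counts from midnight, a test that straddles midnight would report a negative time.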
In CoinShop.bas, the user has six choices to display the data file. The program checks to see if our user is playing with us and attempting to make an invalid choice. If so, the user is awarded the "Knucklehead Response" prize and given a chance to try again.
After the inventory of the coin shop has been sorted, I printed it to the main window using a single FOR/NEXT loop with 6 print statements. I did this only to take advantage of the PRINT TAB() function, which is valid for the main window but invalid if you are printing to another file. If updating the coins.csv file, any PRINT loop could be used.
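Writing records back to a delimited file might look something like the sketch below. To keep it short and self-contained it uses a made-up array, sample$, with only three fields per record, and writes to a hypothetical file named sorted.csv rather than touching coins.csv.

```basic
'Sketch: write a small array to a comma delimited file.
'sample$ and sorted.csv are hypothetical names.
DIM sample$(2, 2)
sample$(0,0) = "val"  : sample$(0,1) = "date" : sample$(0,2) = "mint$"
sample$(1,0) = ".50"  : sample$(1,1) = "1863" : sample$(1,2) = "New Orleans"
sample$(2,0) = "1.00" : sample$(2,1) = "1864" : sample$(2,2) = "Charlotte"

OPEN "sorted.csv" FOR OUTPUT AS #2
FOR i = 0 TO 2
    thisLine$ = ""
    FOR j = 0 TO 2
        thisLine$ = thisLine$ + sample$(i,j)
        'Put a comma between fields, but not after the last one.
        IF j < 2 THEN thisLine$ = thisLine$ + ","
    NEXT j
    PRINT #2, thisLine$ 'One record per line
NEXT i
CLOSE #2
END
```

Building each record as a single string before printing it keeps one record on each line of the file.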
Of course, sorting is not the only thing you will do with data, if you even bother to sort at all. On most occasions you will want to perform some operation with the data.
So far we have done nothing with this data except display it in the main window. If our only need is to keep an eye on our inventory and its location within our shop, this works just fine. Stay tuned to this channel when we get into future subjects, such as A User Friendly GUI to Maintain the Data Base, actually doing work with a data file, and an introduction to random access files for simple ways to do nerdy things with data.
DEMO
The files coinshop.bas and coins.csv are included in the zipped archive of this newsletter.
::::::::::::::::::::::::::::::::::::::::::::::::::::